ABSTRACT

This paper is the second step in the preparation of 
forecasts of occupational and industrial information which will meet 
the needs of the Information System for Vocational Decisions (ISVD).
The author discusses the computation routines which need to be 
developed, tested and operationalized toward the goal of combining 
occupational and industrial information and projections, and storing, 
processing and retrieving it. The paper begins with a fairly abstract 
discussion of the terminology and principles to be used. These 
principles are then applied to the collection of information from the 
available sources. (TL) 
FORECASTING FOR COMPUTER AIDED CAREER DECISIONS:
PROSPECTS AND PROCEDURES 



Richard M. Durstine 
Graduate School of Education 
Harvard University 



Introduction 

This paper is the second step in the preparation of forecasts
of occupational and industrial information. It extends and develops
the ideas of Russell G. Davis' recent survey of forecasting
methodology (Reference 1), using forms, procedures and work programs
designed to meet the needs of the ISVD project.

The eventual aim, for which these necessary foundations are 
now being laid, is to combine information from diverse sources and 
thus to provide forecasts more complete and comprehensive than are 
now available. This can be done only through carefully designed 
computation routines, which will take some time to develop, test and 
put into operation. The present paper fills the gap between the 
basic methodology (Reference 1) and the working routines. It should 
also serve as a basis for discussion in planning for and preparing 
those routines. 

The following are the goals of this effort: 

a) Ability to collect and absorb in explicit form and with
minimum distortion any objective statement about the
future of occupations or industries and their attributes,
and to relate such statements to one another.

b) Specifically, projections of employment, earnings, etc., 
for future years by occupation and by industry, in as 
much detail as the available sources of information permit. 

c) Separate treatment of short and long range projections, 
making use of different sources of information for these 
two classes of projections. 

d) Provision for finding, collecting, organizing, storing,
processing, and retrieving information in as general a
form as possible.

The underlying attitude throughout is that the limited re- 
sources of ISVD make it presumptuous to prepare new forecasts, ex- 
cept on an experimental basis. The primary job will therefore be 
to assemble and integrate what has been prepared by others, to inter- 
polate where gaps exist, and to identify deficiencies. Experimental 
development of further forecasting capability is possible, but only 
after existing material has been thoroughly tapped. 

The remaining discussion begins with a fairly abstract treatment
of the terminology and principles to be used. This is followed by
the application of these principles to the collection of
information from the available sources. This approach will then be
shown to be a consistent extension of the methods given in Reference 1.

Method and Terminology 

The logical constructs and terminology that will be used re- 
peatedly in developing forecasts for career decisions will now be 
explained briefly. If the reader finds this description excessively 
abstract, he can proceed directly to the next section, where the 
presentation is more applied and explicit, and use this earlier ma- 
terial as a reference. 

Information will be identified here in terms of coordinate
dimensions and content dimensions, where

a) Coordinate dimensions describe the situation (e.g., in- 
dustry, occupation, time) to which the information refers. 
They can be thought of as the identifying labels on the 
rows and columns of a table or matrix. 

b) Content dimensions describe the nature of the information 
(e.g., population, average earnings, level of employment). 
Clearly, what is a coordinate dimension in one instance 
may be a content dimension in another, so the two are not 
always distinct. In context, however, they will usually 
be distinguishable from one another. 

The dimensions (particularly the coordinate dimensions) can be 
separated into 

a) Scaled dimensions, to which a numerical scale, either
continuous or discrete, can be attached. These dimensions
admit both to values specific to a particular point on
the scale, and to averages or totals over intervals of
the scale. Time and age are such dimensions.

b) Unscaled dimensions, to which no scale can be attached. These
dimensions must be broken into exhaustive and mutually exclusive
categories. Examples are industry and occupation.

When dimensions are unscaled or when scaled dimensions are treated in
terms of intervals, the total range of possibilities covered will be
called the domain of the dimension, and the exhaustive and mutually
exclusive set of categories or intervals that cover the domain will be
called its partition. A domain can have several distinct partitions,
of course.
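
The requirement that a partition be exhaustive and mutually exclusive can be checked mechanically. The following is a minimal sketch in Python, assuming a scaled dimension (age) whose partition cells are half-open intervals. The function name `is_partition` and the example intervals are illustrative assumptions, not part of the ISVD design.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [low, high)


def is_partition(domain: Interval, cells: List[Interval]) -> bool:
    """Return True if the cells cover the domain exactly, with no gaps or overlaps."""
    if not cells:
        return False
    previous_high = domain[0]
    # Walk the cells in order; each must be non-empty and must begin
    # exactly where the previous one ended (starting at the domain's low end).
    for low, high in sorted(cells):
        if low != previous_high or high <= low:
            return False
        previous_high = high
    # The last cell must end where the domain ends.
    return previous_high == domain[1]


# Hypothetical domain: ages 15 up to (not including) 65.
ages = (15, 65)
ten_year = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65)]
with_gap = [(15, 25), (35, 45), (45, 55), (55, 65)]
with_overlap = [(15, 30), (25, 65)]
```

Here `ten_year` is accepted as a partition of `ages`, while `with_gap` (not exhaustive) and `with_overlap` (not mutually exclusive) are rejected. An unscaled dimension such as industry would need a check on category membership instead of interval endpoints.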

Information content, when expressed in quantitative terms, can be
given as:

a) Total quantity associated with a relevant point, interval, 
or category. 

b) Level (e.g., average value) of the quantity within a category
or interval. This level will relate to total quantity through
some measure on the category or interval. (E.g., wage level
for an occupation is related to total wages through the number
of persons pursuing that occupation. Here number of persons
is the measure, and the individual occupations are the
categories.)

c) Fraction of the total quantity in the domain that is contained 
by a category or interval. 

For theoretical work and for abbreviated identification, the
following notation will be used:

                         Total        Level          Fraction

Quantity                 $Q_j$        $\bar{Q}_j$    $q_j$

Change in Quantity       $S_j$        $\bar{S}_j$    $s_j$

Rate of Change           $R_j$        $\bar{R}_j$    $r_j$

where the subscript j refers to the type of content (population,
earnings, etc.), which must always be clearly identified. Clearly,
change and rate of change must always be expressed in terms of some
scaled coordinate dimension.

Let the domain of a dimension (or of a space spanned by several
dimensions) be represented by $\Delta_m$ and the partitions within
$\Delta_m$ by $P_{mn}$. Let the individual cells of the partition be
identified by the index $h$. Then

$$\sum_h q_j(h) = 1$$   (fractional parts must sum to the whole)

$$\sum_h s_j(h) = \sum_h r_j(h) = 0$$   (changes in fractional parts must cancel out)

where $q_j(h)$, $Q_j(h)$, etc. will be used as short forms, where

$$q_j(h) = q_j(P(h)) = q_j(P_{mn}(h))$$

$$Q_j(h) = Q_j(P(h)) = Q_j(P_{mn}(h))$$

etc.

The shortest form consistent with clarity will usually be used. Also

$$Q_j(h) = W_i(h)\,\bar{Q}_j(h)$$

where $W_i(h)$ is the weighting function or measure, mentioned earlier,
defined on $P_{mn}$, that relates level to total quantity. The index i is
included to distinguish among such measures. Note that $W_i$ is itself
a form of information content, renamed to emphasize the special
purpose it serves here.

Another relevant general formula is

$$q_j(h) = \frac{Q_j(h)}{\sum_h Q_j(h)}$$

which converts totals to fractional parts that sum to unity over the
domain $\Delta_m$.
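
The conversion from totals to fractional parts can be illustrated with a short sketch. It assumes totals are held in a dictionary keyed by partition cell; the function name `to_fractions` and the employment figures are made up for illustration. Exact rational arithmetic is used so that the fractions sum to exactly one.

```python
from fractions import Fraction
from typing import Dict


def to_fractions(totals: Dict[str, int]) -> Dict[str, Fraction]:
    """Convert the total in each cell h into its fraction of the domain total."""
    domain_total = sum(totals.values())
    if domain_total == 0:
        raise ValueError("domain total is zero; fractions are undefined")
    return {cell: Fraction(q, domain_total) for cell, q in totals.items()}


# Made-up employment totals (thousands of persons) for three industry cells.
employment = {"manufacturing": 500, "services": 300, "farm": 200}
shares = to_fractions(employment)
```

With these figures the share of services is 3/10, and the three shares sum to one, as the constraint on fractional parts requires.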

The total of $Q_j$ over the domain will be denoted by
$Q_j(\Delta_{mn})$, where the subscript n is needed in case there is a
difference in $Q_j$ depending on the partition of $\Delta_m$. Then

$$Q_j(\Delta_{mn}) = \sum_h W_i(h)\,\bar{Q}_j(h)$$

and the level of $Q_j$ over the entire domain is

$$\bar{Q}_j(\Delta_{mn}) = \frac{\sum_h W_i(h)\,\bar{Q}_j(h)}{\sum_h W_i(h)}$$

So $\sum_h W_i(h)$ plays the same role in relating $\bar{Q}_j(\Delta_{mn})$
to $Q_j(\Delta_{mn})$ as did $W_i(h)$ between $\bar{Q}_j(h)$ and $Q_j(h)$.
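
As a worked illustration of the relation between level and total, the sketch below computes a domain total and a domain level from cell levels and a measure (weight) on each cell. The names `domain_total` and `domain_level`, and the wage figures, are assumptions made for this example only.

```python
from typing import Dict


def domain_total(levels: Dict[str, float], weights: Dict[str, float]) -> float:
    """Total over the domain: sum over cells of weight times level."""
    return sum(weights[h] * levels[h] for h in levels)


def domain_level(levels: Dict[str, float], weights: Dict[str, float]) -> float:
    """Level over the domain: the weighted mean of the cell levels."""
    total_weight = sum(weights[h] for h in levels)
    if total_weight == 0:
        raise ValueError("total weight is zero; the domain level is undefined")
    return domain_total(levels, weights) / total_weight


# Made-up figures: wage level (dollars per year) and number of persons
# (the measure) for two occupation cells.
wage_level = {"engineers": 10000.0, "clerks": 5000.0}
persons = {"engineers": 100.0, "clerks": 300.0}

total_wages = domain_total(wage_level, persons)
average_wage = domain_level(wage_level, persons)
```

Total wages come to 2,500,000 dollars over 400 persons, so the wage level for the whole domain is 6,250 dollars, which lies nearer the clerks' level because that cell carries more weight.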
Collection and Preparation of Information

For applied work, domains and partitions must be identified 
in terms of specific individual coordinate dimensions, from which 
more complex domains and partitions can be constructed as needed. 

Information will be gathered from a variety of sources, 
stored in a consistent manner, and combined to be used, insofar as 
possible, as an integrated whole. Some of the more fruitful of 
these sources are shown in References 2 through 4. 

The procedure to be followed in gathering, preparing, and 
treating this data will be as follows: 

a) Survey available information sources. To this end, pro- 
cedures for both preliminary and detailed surveys must be devised. 
These procedures will be outlined later in this memorandum. 

b) Collect and store this information. 

c) Devise routines for its manipulation, and in particular 
for improvising information that is not directly available from the 
original sources. 

The intent is to have a structure able to treat a broad range
of information. We thus need a knowledge of which information is, or
is likely soon to be, available. Experimentation with small segments
of this information will serve to test out the structure. Subsequent
collection, inclusion, and use of information will depend on its
availability and on the needs of the ISVD project. The goal is a
working tool that can then be used and progressively developed. We
seek a living, growing organism, not a closed, static data base.
In this sense the job can never be finished, but our results should
be all the more valuable because of this trait.

The following paragraphs set forth a program for developing 
this information gathering capability, with example procedures. 

Tentative Work Schedule 

A comprehensive survey of sources of information and their
subsequent development into the proposed system for information
treatment will involve several steps. These are listed below in
approximate sequence, by the type of personnel who would be the
principal participants.

Professional Personnel              Clerical Personnel

1. Preliminary survey of
   information sources

3. Preparation of formulas          2. Full survey of information
                                       sources

4. Preparation of computation       5. Ongoing assembly, punching,
   routines                            and verification of data

6. Ongoing experimentation
   and development

Procedures for Collection of Information 

The preliminary and full surveys of information sources will be 
described here in terms of the forms to be used for these surveys 
and in terms of example dimensions, domains, and partitions. 

Forms for the preliminary information survey are as follows:

1. Source List (see Exhibit 1)

2. Catalog of Content Types (see Exhibit 2)

3. Simplified Catalog of Domains (see Exhibit 3)

4. Simplified Catalog of Partitions (see Exhibit 4)

5. Preliminary Source Survey (see Exhibit 5)



Source
Number    Title                                      Location

  1       Projections to the Years 1976 and 2000:    ISVD/CSED Library
          Economic Growth, Population, Labor Force   (30-30-43-F)
          and Leisure, and Transportation. Outdoor
          Recreation Review Commission Study
          Report 23, 1962.

  2       America's Industrial and Occupational
          Manpower Requirements, etc.

Exhibit 1

Example Form for Source List (for both preliminary and full surveys)

In the Source List of Exhibit 1, each source of information is
assigned a number, identified by title, etc., and the location of a
copy of the source is indicated. In the example shown above,
a library number related to the project collection is used.
Form of      Serial Number
Contents     (Content type)     Description

   Q              1             Population (persons)

   Q              2             Population (households)

   q              1             Proportion of population (persons)

   R              1             Annual rate of change of family income

   q              2             Proportion of total income

Exhibit 2

Example Form for Catalog of Content Categories
(for both preliminary and full surveys)



A Catalog of Content Categories (Exhibit 2) is needed to keep
track of the quantities that are being included as content dimensions,
to ensure consistency of notation and to avoid repetition. Designation
of the form of the contents here is consistent with the scheme
suggested earlier in this paper (i.e., Q for total quantities,
q for fractions of the whole, etc.). The serial number specifies to
what type of information the content refers. There need be no system
to the assignment of serial numbers, since they are used for
identification only.
Domain Number     Description

    A1            All ages

    L1            All locations including armed forces overseas

    L2            All locations, not including armed forces overseas

    I1            All industries

    I2            All manufacturing industries

Exhibit 3

Example Form for Simplified Catalog of Domains

Domains, as suggested in Exhibit 3, will be identified by a
letter code, indicating dimension, and a serial number. Again there
need be no system to the assignment of serial numbers. Likely codes
for the various dimensions are:

A : Age            L : Location

I : Industry       E : Earnings

O : Occupation     T : Time



Partition Number     Description

    A 10             No partitioning of A1

    A 11             Partition of A1 in 5 year segments

    L 20             No partitioning of L2

    L 21             Partition of L2 by states

    I 10             No partitioning of I1

    I 11             Partition of I1 by 2 digit SIC categories

    I 12             Partition of I1 by 1 digit 1960 census categories

    O 10             No partitioning of O1

    O 11             Partition of O1 by 1 digit 1960 census categories

    O 12             Partition of O1 by 2 digit 1960 census categories

Exhibit 4

Example Form for Simplified Catalog of Partitions

The letter and first digit of the partition code (Exhibit 4)
are the same as for the domain that includes the partition. The
code "0" will be used to mean no partition of the domain. Catalogs
of domains and partitions used with the full, detailed source survey
will be similar, but stated with greater precision and detail.
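
The coding scheme for partitions lends itself to mechanical decoding. The sketch below assumes that a code consists of one dimension letter, one digit for the domain, and the remaining digits for the partition, with O taken as the letter for Occupation. The names `PartitionCode` and `parse_partition_code` are hypothetical.

```python
from typing import NamedTuple

# Dimension letters as listed for the catalog of domains (O assumed for Occupation).
DIMENSIONS = {"A": "Age", "L": "Location", "I": "Industry",
              "E": "Earnings", "O": "Occupation", "T": "Time"}


class PartitionCode(NamedTuple):
    dimension: str   # e.g. "Age"
    domain: str      # e.g. "A1"
    partition: int   # 0 means the domain is not partitioned


def parse_partition_code(code: str) -> PartitionCode:
    """Split a code such as 'A 11' or 'L20' into dimension, domain, and partition."""
    compact = code.replace(" ", "").upper()
    if len(compact) < 3 or compact[0] not in DIMENSIONS or not compact[1:].isdigit():
        raise ValueError(f"not a partition code: {code!r}")
    # Assumption: exactly one digit identifies the domain; the rest is the partition.
    return PartitionCode(DIMENSIONS[compact[0]], compact[:2], int(compact[2:]))


five_year_ages = parse_partition_code("A 11")
whole_l2 = parse_partition_code("L20")
```

Under these assumptions `A 11` decodes to partition 1 of domain A1 on the Age dimension, and `L20` to the unpartitioned domain L2. A catalog with more than nine domains per dimension would need a delimiter or a fixed field width.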



                        Content           Domain       Partition
Source   Pages          Types             Types        Types

  1      17-34,         Q1, Q2, Q4,       A1, 6, 5     A11, 12, 61, 50, 52
         50, 72-91,     S1,               L2, 9        L20, 95
         112-114,       r5,               O1, 2        O10, 21, 22
         125, 127,      q1, q12, q16      I1           I10, 11, 12
         129

Exhibit 5

Example Form for Preliminary Source Survey
The Preliminary Source Survey (Exhibit 5) summarizes in
non-detailed form the contents of the listed information sources. The
purpose is to give a concise survey of the contents and the partitioning
of the information in each source. These surveys will be used as a
reference in making the full source survey, in preparing for
computations, and in locating deficiencies in the information supply.
Domain types and partition types need not both be given on the source
survey sheet.

For the full survey of information sources, to follow and 
elaborate on the preliminary survey, the following forms will be used: 

1. Source List (same as for the preliminary survey, see Exhibit 1)

2. Catalog of Content Types (same as for the preliminary survey, see 
Exhibit 2) 

3. Full Catalog of Domains (like that for the preliminary survey, 
but expressed in more detail and with more precision, see Ex- 
hibit 3) 

4. Full Catalog of Partitions (like that for the preliminary survey, 
but expressed in more detail and with more precision, see Ex- 
hibit 4) 

5. Full Source Survey (see Exhibit 6) 

6. Survey Record (see Exhibit 7) 
         Item      Content
Source   Number    Type       Units            Domain      Partition

  1        1       Q1         Thousands        A1, L1,     A11, L10,
                              of persons       O1, I2      O11, I20

  1        2       Q2         Thousands        A1, L2,     A11, L21,
                              of households    O1, I2      O11, I23

Exhibit 6

Example Form for Full Source Survey



The form for the Full Source Survey (Exhibit 6) is similar to that of
Exhibit 5 for the Preliminary Source Survey, except in the following
points:

1. Each occurrence of information is listed separately. 

2. Full information about the domain and partition of each occurrence 
is given. 

3. Units in which the information is expressed are specified. 
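
Because each occurrence of information is listed separately in the full survey, the listing can be held as a set of records and searched. The sketch below is one possible representation, using the two example entries of Exhibit 6. The class name `SurveyItem`, its field names, and the helper `items_for` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SurveyItem:
    """One line of the Full Source Survey form."""
    source: int
    item: int
    content_type: str
    units: str
    domains: Tuple[str, ...]
    partitions: Tuple[str, ...]


def items_for(items: List[SurveyItem], content_type: str, partition: str) -> List[SurveyItem]:
    """Find every recorded occurrence of a content type under a given partition."""
    return [it for it in items
            if it.content_type == content_type and partition in it.partitions]


survey = [
    SurveyItem(1, 1, "Q1", "thousands of persons",
               ("A1", "L1", "O1", "I2"), ("A11", "L10", "O11", "I20")),
    SurveyItem(1, 2, "Q2", "thousands of households",
               ("A1", "L2", "O1", "I2"), ("A11", "L21", "O11", "I23")),
]

by_state = items_for(survey, "Q2", "L21")
missing = items_for(survey, "Q1", "L21")
```

An empty result, as for `missing` here, marks a deficiency in the information supply: no surveyed source gives content type Q1 partitioned by state.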



          Item         Pages
Source    Numbers      Checked        Completion

  1       1-10         1-50, 75,      All tables, pages 1-5, those
                       81-93          marked on other pages

  2       1-14         All            All tables included

  3                    All            This source contains no
                                      relevant information

Exhibit 7

Example of Survey Record



The Survey Record (Exhibit 7) serves to record the degree to
which each source has been canvassed by the Full Source Survey. The
pages in the source which have been reviewed are indicated, along with
the items that have been collected from these pages. A statement of
the degree to which the source has been examined and to which its
contents have been noted is also given. A similar form could apply
later on to the coding and punching of information.



The most frequently occurring dimensions will probably be loca- 
tion, occupation, and industry. A sampling of some of the domains and 
partitions of these dimensions likely to be used is shown below. 



Location

     Domains                  Partitions

     Massachusetts            By County
     New England              By State
     Full United States       By Region

Occupations

     Domains                  Partitions

     Engineering              U.S. Census 1960, 2 digits
     All Civilian             U.S. Census 1960, 1 digit
     Professional             U.S. Census 1950, 1 digit
     All                      U.S. Census 1950, 2 digits
     Skilled                  Dictionary of Occupational Titles

Industries

     Domains                  Partitions

     All non-Farm             U.S. Census 1960, 1 digit
     All Manufacturing        U.S. Census 1950, 1 digit
     All Service              Standard Industrial Classification
     All                      U.S. Census 1950, 2 digits
                              U.S. Census 1960, 2 digits
                              Dictionary of Occupational Titles

Many small variations in partitions will occur, and for proper 
processing must be made compatible or included in separate listings, 
whichever is appropriate. 

Preparation of Formulas and Routines 
for Computation 

The information collection procedures outlined earlier are the 
first step in combining the contents of individual sources of informa- 
tion to make a whole that is greater, in terms of the understanding 
it provides, than the sum of its parts. To this general end the fol- 
lowing must be possible with regard to whatever information is col- 
lected: 

a) to fill gaps in individual content categories through

   - interpolation

   - extrapolation

b) to condense and summarize information about individual content
categories through

   - averaging

   - summing

c) to establish relationships among content categories in order to
help fill gaps and to construct whatever new categories may prove
useful

Additional computational capabilities that should also be included are: 

- Statistical procedures to suitably combine conflicting or 
overlapping information 

- Translation among partitions or domains 

- Discounting procedures 

- Normalization to satisfy constraints

- Derivation of new partitions from sets of old partitions 

- Projections in terms of expected effects (e.g., technological
change, urbanization) as a modification to purely extrapolative
methods.
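
Two of the capabilities listed above, gap filling and normalization, can be sketched briefly. The example assumes straight-line interpolation between two known years and proportional scaling to a control total; neither choice is prescribed by this paper, and the function names and figures are made up for illustration.

```python
from typing import Dict


def interpolate(year: int, y0: int, v0: float, y1: int, v1: float) -> float:
    """Straight-line estimate for `year` from known values at years y0 and y1.

    The same formula extrapolates when `year` lies outside the two known years.
    """
    if y0 == y1:
        raise ValueError("the two reference years must differ")
    return v0 + (v1 - v0) * (year - y0) / (y1 - y0)


def normalize(cells: Dict[str, float], control_total: float) -> Dict[str, float]:
    """Scale cell values proportionally so that they sum to a control total."""
    current = sum(cells.values())
    if current == 0:
        raise ValueError("cannot scale cells that sum to zero")
    return {h: v * control_total / current for h, v in cells.items()}


# Made-up employment figures (thousands): known for 1966 and 1980, wanted for 1973.
employment_1973 = interpolate(1973, 1966, 1000.0, 1980, 1700.0)

# Made-up cell forecasts that must agree with an independent total of 100.
adjusted = normalize({"a": 30.0, "b": 50.0}, 100.0)
```

The interpolated value for 1973 is 1,350, halfway between the two known figures. The adjusted cells become 37.5 and 62.5, keeping their original ratio while meeting the constraint.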

To illustrate that this prospectus includes within it the 
capabilities that have already been proposed, the methods of Refer- 
ence 1, the first ISVD technical paper on forecasting, are shown 
schematically below in terms of the content categories discussed here. 
Method 1

     Q1(1966)  -->  Q1(1980)

where Q1 is employment specified by industry and occupation.

The above diagram indicates direct translation from 1966 to 1980
without use of further information. The equality relation in
Reference 1 is a special case of this.



Method 2

[Diagram relating Q1(1966), Q2(1966), Q2(1980), and q1 to Q1(1980).]

where the newly introduced quantities are defined as follows.

Q2 is total employment

q1 is distribution of employment among occupations and industries.
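
One possible reading of Method 2 is sketched below. It is an assumption, not a statement of the method in Reference 1: the base-year distribution of employment among cells is held fixed and applied to a projected total for the later year. The function name and all figures are invented for illustration.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (industry, occupation)


def project_fixed_distribution(base: Dict[Cell, float],
                               future_total: float) -> Dict[Cell, float]:
    """Apply the base-year distribution of employment to a projected total.

    Assumption: total employment in the base year equals the sum of the cells.
    """
    base_total = sum(base.values())
    if base_total == 0:
        raise ValueError("base-year total is zero; no distribution is defined")
    # Each cell keeps its base-year share of the (new) total.
    return {cell: value * future_total / base_total for cell, value in base.items()}


# Made-up 1966 employment (thousands) by industry and occupation.
employment_1966 = {
    ("manufacturing", "engineer"): 20.0,
    ("manufacturing", "clerk"): 60.0,
    ("services", "clerk"): 120.0,
}
# Made-up projection of total employment for 1980.
employment_1980 = project_fixed_distribution(employment_1966, 300.0)
```

With a total rising from 200 to 300, every cell grows by half, so the 1980 cells are 30, 90, and 180. Allowing the distribution itself to shift over time would require projecting the fractions as well as the total.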



Method 2A

[Diagram relating Q3(1966), Q3(1980), Q4(1966), and Q4(1980) to Q5(1980).]

where the newly introduced quantities are defined as follows.

Q3 is output by industry

Q4 is output per worker

Q5 is employment by industry.
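
The three quantities of Method 2A are tied together by simple arithmetic: employment equals output divided by output per worker. The sketch below applies that identity industry by industry. How the 1980 values of output and output per worker are themselves obtained is not shown, and the function name and figures are assumptions.

```python
from typing import Dict


def employment_by_industry(output: Dict[str, float],
                           output_per_worker: Dict[str, float]) -> Dict[str, float]:
    """Employment = output / output per worker, industry by industry."""
    result = {}
    for industry, total_output in output.items():
        per_worker = output_per_worker[industry]
        if per_worker <= 0:
            raise ValueError(f"output per worker must be positive for {industry}")
        result[industry] = total_output / per_worker
    return result


# Made-up 1980 projections: output in thousands of dollars, and output per
# worker in thousands of dollars per worker.
output_1980 = {"steel": 12_000_000.0, "textiles": 4_500_000.0}
per_worker_1980 = {"steel": 24.0, "textiles": 15.0}
employment_1980_by_industry = employment_by_industry(output_1980, per_worker_1980)
```

These figures give 500,000 workers in steel and 300,000 in textiles. Both inputs must be stated in compatible units, which is one reason the full source survey records units for every item.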



Method 2B

[Diagram relating Q5(1966), R5(1966-80), Q5(1980), and Q6(1966) to Q6(1980).]

where the newly introduced quantities are defined as follows.

R5 is rate of growth 1966-80, nationally

Q6 is regional or local employment by industry
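
A sketch of one way Method 2B might be carried out follows. It assumes, without confirmation from Reference 1, that national growth over 1966-80 is expressed as the ratio of national employment in the two years and that this ratio is applied unchanged to each regional industry. The function name and figures are invented.

```python
from typing import Dict


def regional_projection(regional_base: Dict[str, float],
                        national_base: Dict[str, float],
                        national_future: Dict[str, float]) -> Dict[str, float]:
    """Scale regional base-year employment by the national growth ratio per industry."""
    return {
        industry: value * national_future[industry] / national_base[industry]
        for industry, value in regional_base.items()
    }


# Made-up national employment by industry (thousands).
national_1966 = {"steel": 600.0, "textiles": 900.0}
national_1980 = {"steel": 750.0, "textiles": 810.0}
# Made-up regional employment by industry in the base year (thousands).
region_1966 = {"steel": 12.0, "textiles": 50.0}

region_1980 = regional_projection(region_1966, national_1966, national_1980)
```

Steel grows nationally by a quarter and textiles shrinks by a tenth, so the regional figures become 15 and 45. The assumption that a region follows national growth exactly is a strong one; local information, where available, would modify it.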



Method 3A

[Diagram not legible in the source.]

Method 3B

[Diagram; legible entries include q1(1966), Q9(1980), and a1(Q3), leading to Q1(1980).]

where like industries are grouped together, and a1(Q3)
is a coefficient relating employment mix and output within
each group of similar industries.



Methods 4 and 5 are examples of adjustment or normalization of 
forecasts in terms of the results of other, related forecasts. They 
will not be discussed further here. 

A comprehensive set of relations and formulas must be compiled 
and checked out, sometimes with alternate formulas for a given purpose. 

A list of these formulas will then be prepared, along with a body of 
rules for their use, including: 

- terms and conditions of use 

- form and type of input information needed 

- form, nature, and possible use of the output, including an 
evaluation of its likely quality (e.g., accuracy, bias). 

On this base a set of computer routines will be developed, to eventually 
constitute a specialized computer language to handle occupational and 
industrial forecasts. 

To test and use this information system, not all available data
would be introduced at once. Data would be coded in stages, as needed,
and would become part of the total supply of coded input data,
to be combined and experimented with on an ongoing basis.

The following points, while not central to the above discussion, 
should also be kept in mind: 

- We will want to allow for and include non-quantitative infor- 
mation, not for computation but for simple storage and recall. 

- There should be a special information gathering and treatment 
program for short term information, using as sources 

- Job orders and similar local sources, 

- Newspaper advertisements, 

- Bureau of employment security materials, 

- Employment/unemployment figures. 

- Computer output should be made to include statements of sources, 
of input quantities used, and of formulas used, to aid in 
checking the results. 

- Computer output should indicate the information gaps found 

in trying to do computations, as an alert that these gaps will 
need to be filled. 
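
The last two points, on output that documents itself and reports gaps, can be sketched as follows. The example assumes a store that maps each content label to a value and a source number from the Source List. The names `Report` and `ratio_report`, and the labels and figures, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical store: content label -> (value, source number from the Source List).
Store = Dict[str, Tuple[float, int]]


@dataclass
class Report:
    formula: str
    value: Optional[float] = None
    inputs: Dict[str, float] = field(default_factory=dict)
    sources: List[int] = field(default_factory=list)
    gaps: List[str] = field(default_factory=list)


def ratio_report(store: Store, numerator: str, denominator: str) -> Report:
    """Compute numerator / denominator, recording inputs, sources, and gaps."""
    report = Report(formula=f"{numerator} / {denominator}")
    for label in (numerator, denominator):
        if label in store:
            value, source = store[label]
            report.inputs[label] = value
            if source not in report.sources:
                report.sources.append(source)
        else:
            # Alert: this input must still be collected before computing.
            report.gaps.append(label)
    if not report.gaps:
        report.value = report.inputs[numerator] / report.inputs[denominator]
    return report


store: Store = {"Q3(1980)": (12000.0, 1)}
incomplete = ratio_report(store, "Q3(1980)", "Q4(1980)")

store["Q4(1980)"] = (24.0, 2)
complete = ratio_report(store, "Q3(1980)", "Q4(1980)")
```

The first call returns no value and names `Q4(1980)` as a gap. Once that input is stored, the second call returns 500 together with the formula, the two input quantities, and the source numbers 1 and 2, so the result can be checked by hand.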

There will be many lessons learned as the procedures suggested 
here are put to practical use. The present discussion is meant to be 
a point of departure, and should in no way limit the range or scope 
of future activity. Some important possible extensions of capability 
in forecasting are mentioned at the end of Reference 1. Of the many 
possible directions that this work can legitimately take, the
easiest to attain generally should be undertaken first. The more
difficult ones will come after a foundation has been laid and the
details of the procedures and analysis are better understood.